{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Special case: evaporation of water\n", "**TL;DR:** Estimation of rates and yields is an important step in characterization and selection of cell factories and bioprocess. Evaporation of water from the bioreactor will bias the estimated yields and rates. Using the Pseudo batch transformation one can estimate the correct values when evaporation is significant.\n", "\n", "## Why care about evaporation?\n", "Evaporation of water from a bioreactor will bias the estimated titer, rates and yields if it is not properly accounted for. As a motivating example, imagine a batch bioreactor with an initial volume of 100 ml. When analysing batch processes it is common to assume that any change in concentration is caused by the metabolic activity of the organism in the reactor. However, if 10 ml water evaporated during the process this would cause a significant increase in concentration which is unrelated to the metabolic activity. As result the production rates will appear larger causing the strains to look more attractive, than reality.\n", "\n", "Evaporation can vary a lot between different bioreactor equipment and scales, thus evaporation could very well take a part of the blame for the performance variation of cell factories at different scales. The cause of the varying evaporation is multifaceted and can originate from dryness of process air, presence of condensers, differences in agitation method, etc [1]. Thus, the same strain can appear to perform differently in between cultivation setups if evaporation is not accounted for in the bioreactor volume.\n", "\n", "For fed-batch fermentations, evaporation adds an additional source of volume change. The Pseudo batch transformation can handle scenarios where evaporation is significant, however to get accurate estimates of rates and yields the transformation method needs the true volume of the bioreactor. The true volume data may not always be accessible. 
This tutorial has two objectives:\n", "1. Show that the Pseudo batch transformation method works when water evaporation is significant.\n", "2. Illustrate a common scenario where evaporation is not accounted for in the volume data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we start, we will load the necessary Python packages and functions." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from pseudobatch import pseudobatch_transform_pandas\n", "from pseudobatch.datasets import load_evaporation_fedbatch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pseudo batch transformation works when evaporation is significant\n", "The `pseudobatch` package holds an example dataset which was produced to mimic a fed-batch process with an exponential feeding profile where water evaporates from the bioreactor at a constant rate (the simulation script is found at https://github.com/biosustain/pseudobatch/article/simulation_scripts/fed-batch_evaporation.jl). This data is loaded in the code block below. We also show some of the data columns to give an overview of the structure of this data frame." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>timestamp</th><th>sample_volume</th><th>c_Glucose</th><th>c_Biomass</th><th>c_Product</th><th>c_CO2</th><th>v_Volume</th><th>v_Feed_accum</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.00000</td><td>0.0</td><td>0.075000</td><td>0.500000</td><td>0.000000</td><td>0.0</td><td>1000.000000</td><td>0.000000</td></tr>\n",
"    <tr><th>1</th><td>0.06006</td><td>0.0</td><td>0.075004</td><td>0.503014</td><td>0.002474</td><td>0.0</td><td>999.995704</td><td>0.055765</td></tr>\n",
"    <tr><th>2</th><td>0.12012</td><td>0.0</td><td>0.075009</td><td>0.506047</td><td>0.004964</td><td>0.0</td><td>999.991745</td><td>0.111865</td></tr>\n",
"    <tr><th>3</th><td>0.18018</td><td>0.0</td><td>0.075013</td><td>0.509097</td><td>0.007469</td><td>0.0</td><td>999.988123</td><td>0.168303</td></tr>\n",
"    <tr><th>4</th><td>0.24024</td><td>0.0</td><td>0.075016</td><td>0.512166</td><td>0.009988</td><td>0.0</td><td>999.984842</td><td>0.225082</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " timestamp sample_volume c_Glucose c_Biomass c_Product c_CO2 \n", "0 0.00000 0.0 0.075000 0.500000 0.000000 0.0 \\\n", "1 0.06006 0.0 0.075004 0.503014 0.002474 0.0 \n", "2 0.12012 0.0 0.075009 0.506047 0.004964 0.0 \n", "3 0.18018 0.0 0.075013 0.509097 0.007469 0.0 \n", "4 0.24024 0.0 0.075016 0.512166 0.009988 0.0 \n", "\n", " v_Volume v_Feed_accum \n", "0 1000.000000 0.000000 \n", "1 999.995704 0.055765 \n", "2 999.991745 0.111865 \n", "3 999.988123 0.168303 \n", "4 999.984842 0.225082 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fedbatch_df = load_evaporation_fedbatch()\n", "fedbatch_df[['timestamp','sample_volume', 'c_Glucose', 'c_Biomass','c_Product', 'c_CO2', 'v_Volume', 'v_Feed_accum']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To show the effect of the evaporation, we will calculate the expected volume if no volume evaporated. This is simply done by starting with the initial reactor volume, then adding the volume of feed and subtracting the sample volume. Below we calculate the expected volume and store it in a column of the dataframe." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "initial_volume = fedbatch_df['v_Volume'].iloc[0]\n", "\n", "# Calculated the expected volume by tracking known volume changing \n", "# events. For the Pseudo batch transform the volume has to be the \n", "# volume BEFORE sampling. Therefore the sampling volume data is \n", "# shifted one position before the cumsum.\n", "fedbatch_df['expected_volume'] = (\n", " initial_volume \n", " + fedbatch_df['v_Feed_accum']\n", " - fedbatch_df['sample_volume'].shift(1, fill_value=0).cumsum()\n", ")\n", "\n", "fedbatch_df.plot(\n", " x='timestamp',\n", " y=['v_Volume', 'expected_volume'],\n", " label=['True volume', \"Expected volume\"],\n", " color=['C0', 'C2']\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This plot clearly shows that the expected volume (orange line) diverges from the true volume (blue line) of the bioreactor. This divergence is caused by volume evaporation. To estimate the correct rates and yields using the Pseudo batch transformation the true bioreactor volume is required. In real world scenarios the true volume may not be known, this is discussed further in the next section. 
For now we will use the simulated true volume to validate the Pseudo batch transformation method.\n", "\n", "In the following code block the Pseudo batch transformed biomass and glucose concentrations are calculated.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "substrate_in_feed = fedbatch_df['s_f'].iloc[0]\n", "fedbatch_df[['pseudo_Biomass', 'pseudo_Glucose']] = pseudobatch_transform_pandas(\n", " df=fedbatch_df,\n", " measured_concentration_colnames=[\"c_Biomass\", \"c_Glucose\"],\n", " reactor_volume_colname='v_Volume',\n", " accumulated_feed_colname='v_Feed_accum',\n", " concentration_in_feed=[0,substrate_in_feed],\n", " sample_volume_colname='sample_volume'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the standard linear modelling procedures to estimate the specific growth rate and the substrate yield coefficient from the Pseudo batch transformed data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mu0_hat = 0.1\n", "true mu0 = 0.1\n" ] } ], "source": [ "mu0_hat, intercept = np.polyfit(fedbatch_df['timestamp'], fedbatch_df['pseudo_Biomass'].transform(np.log), 1)\n", "print(f\"mu0_hat = {round(mu0_hat, 5)}\")\n", "print(f\"true mu0 = {fedbatch_df['mu0'].iloc[-1]}\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yxs_hat = 1.8500000000000008\n", "true Yxs = 1.85\n" ] } ], "source": [ "Yxs_hat, intercept = np.polyfit(fedbatch_df['pseudo_Biomass'], fedbatch_df['pseudo_Glucose'], 1)\n", "print(f\"Yxs_hat = {abs(Yxs_hat)}\")\n", "print(f\"true Yxs = {fedbatch_df['Yxs'].iloc[-1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that both of the estimated parameters match the true simulated parameters, thus showing that the Pseudo batch transformation can be used in situations where 
evaporation is present." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What happens when volume data does not account for evaporation\n", "In some cultivation systems the volume is not actually measured. This is the case for the m2p-labs BioLector and RoboLector cultivation systems and the Sartorius Ambr® 15 and Ambr® 250 systems. Instead, the reactor volume is inferred by keeping track of the liquid going in and out of the reactor. This method does not account for the evaporation of water from the system and will overestimate the actual volume of the reactor. In the simulated example used above, this would mean that the volume time series data output by the instrument would be the \"Expected volume\" and NOT the \"True volume\" (see plot above).\n", "\n", "Let's investigate what happens if we estimate the specific growth rate and substrate yield using the biased volume data.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "fedbatch_df[['pseudo_Biomass_wrong_volume', 'pseudo_Glucose_wrong_volume']] = pseudobatch_transform_pandas(\n", "    df=fedbatch_df,\n", "    measured_concentration_colnames=[\"c_Biomass\", \"c_Glucose\"],\n", "    reactor_volume_colname='expected_volume',\n", "    accumulated_feed_colname='v_Feed_accum',\n", "    concentration_in_feed=[0,substrate_in_feed],\n", "    sample_volume_colname='sample_volume'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the same linear models as before, we can estimate the rate and yield from the Pseudo batch concentrations that were calculated with the wrong volume." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mu0_hat = 0.09959\n", "true mu0 = 0.1\n" ] } ], "source": [ "mu0_hat, intercept = np.polyfit(fedbatch_df['timestamp'], fedbatch_df['pseudo_Biomass_wrong_volume'].transform(np.log), 1)\n", "print(f\"mu0_hat = {round(mu0_hat, 5)}\")\n", "print(f\"true mu0 = {fedbatch_df['mu0'].iloc[-1]}\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yxs_hat = 1.6852002355943905\n", "true Yxs = 1.85\n" ] } ], "source": [ "Yxs_hat, intercept = np.polyfit(fedbatch_df['pseudo_Biomass_wrong_volume'], fedbatch_df['pseudo_Glucose_wrong_volume'], 1)\n", "print(f\"Yxs_hat = {abs(Yxs_hat)}\")\n", "print(f\"true Yxs = {fedbatch_df['Yxs'].iloc[-1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we see that both the specific growth rate and the substrate yield estimates are biased. The bias of the growth rate is very small (0.09959 vs. 0.1, about 0.4 %), whereas the substrate yield is underestimated by about 9 % (1.685 vs. 1.85). It is important to note that the size of these biases depends on the specific evaporation rate, bioprocess, strain, etc., but the bottom line is that there will be a bias." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To improve the estimated rates and yields, we need to estimate the true bioreactor volume. To do so, we need a function which can calculate or estimate how much water has evaporated from the bioreactor at each time point. Below we will implement a Python function to calculate the evaporated volume. For this illustration, we will assume that the water evaporates at a constant rate. 
Therefore the accumulated evaporation over a given time period is calculated as\n", "\n", "$$\n", "V_{evaporated} = (t_1 - t_0) \\cdot evaporation\\_rate\n", "$$\n", "\n", "\n", "Here the evaporated volume of water ($V_{evaporated}$) is a function of three parameters: the initial time point ($t_0$), the final time point ($t_1$) and the evaporation rate ($evaporation\\_rate$). One could easily imagine modelling the evaporation rate as a function of other factors such as temperature, stirring speed, etc. We leave it up to the user to decide how they will model evaporation in their setup.\n", "\n", "**IMPORTANT NOTE:** We, the authors of this tutorial, are not experts in evaporation, and thus this evaporation function is purely for the sake of the example. We do not recommend using this function for volume correction; instead, please refer to the literature on this topic.\n", "\n", "Now we implement the evaporation function described above.\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def evaporated_volume(t1, t0, evaporation_rate):\n", "    \"\"\"Calculates the liquid phase volume of evaporated water during a\n", "    timespan t0 to t1, assuming that the water evaporates at a known\n", "    fixed rate.\"\"\"\n", "    return (t1 - t0) * evaporation_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As this is a simulated data set, we know the true evaporation rate, which is stored in the `evap_rate` column of the data frame." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Simulated evaporation rate 1.0 µL/h\n" ] } ], "source": [ "simulated_evaporation_rate = fedbatch_df.evap_rate.iloc[0]\n", "print(f\"Simulated evaporation rate {simulated_evaporation_rate} µL/h\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will estimate the volume of evaporated water for each time step using the function we defined above." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "fedbatch_df['estimated_evaporated_volume'] = fedbatch_df['timestamp'].apply(evaporated_volume, t0=0, evaporation_rate=simulated_evaporation_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can estimate the true volume of the bioreactor by subtracting the evaporated volume from the \"measured\" volume." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "fedbatch_df['estimated_true_volume'] = fedbatch_df['expected_volume'] - fedbatch_df['estimated_evaporated_volume']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can visualize the \"measured\", estimated and the true volume to see that the estimated true volume is similar to the true volume." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fedbatch_df.plot(\n", " x='timestamp',\n", " y=['expected_volume','v_Volume', 'estimated_true_volume'],\n", " label=[\"Expected volume\", 'True volume', \"Estimated true volume\"],\n", " style=['-', '-', '--'],\n", " color=['C2', 'C0', 'C1']\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the estimated true volume (green dashed line) follows the true volume (blue solid line) showing that we successfully estimated the true volume. In a real world scenario the true volume will rarely be known and one will have to live with the uncertainty that this introduces into the rate and yield estimates.\n", "\n", "To make the example complete we will estimate the specific growth rate and substrate yield using the estimated true volume for the Pseudo batch transformation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "fedbatch_df[['pseudo_Biomass_estimated_volume', 'pseudo_Glucose_estimated_volume']] = pseudobatch_transform_pandas(\n", " df=fedbatch_df,\n", " measured_concentration_colnames=[\"c_Biomass\", \"c_Glucose\"],\n", " reactor_volume_colname='estimated_true_volume',\n", " accumulated_feed_colname='v_Feed_accum',\n", " concentration_in_feed=[0,substrate_in_feed],\n", " sample_volume_colname='sample_volume'\n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mu0_hat = 0.1\n", "true mu0 = 0.1\n" ] } ], "source": [ "mu0_hat, intercept = np.polyfit(fedbatch_df['timestamp'], fedbatch_df['pseudo_Biomass_estimated_volume'].transform(np.log), 1)\n", "print(f\"mu0_hat = {round(mu0_hat, 5)}\")\n", "print(f\"true mu0 = {fedbatch_df['mu0'].iloc[-1]}\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yxs_hat = 1.850000000000001\n", "true Yxs = 1.85\n" ] } 
], "source": [ "Yxs_hat, intercept = np.polyfit(fedbatch_df['pseudo_Biomass_estimated_volume'], fedbatch_df['pseudo_Glucose_estimated_volume'], 1)\n", "print(f\"Yxs_hat = {abs(Yxs_hat)}\")\n", "print(f\"true Yxs = {fedbatch_df['Yxs'].iloc[-1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the calculated rate and yield match the simulated parameters. This high accuracy is of course only possible because we are working with simulated data and know the true evaporation rate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "In this tutorial, we have shown that the Pseudo batch transformation is applicable when significant amounts of water evaporate from the bioreactor, provided that the true reactor volume is known or can be estimated. Furthermore, the tutorial highlighted some of the considerations concerning evaporation from bioreactors and illustrated an overall approach to account for water evaporation when applying the Pseudo batch transformation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "[1] M. Ask and S. M. Stocks, “Aerobic bioreactors: condensers, evaporation rates, scale-up and scale-down,” Biotechnol Lett, vol. 44, no. 7, pp. 813–822, Jul. 2022, doi: 10.1007/s10529-022-03258-7." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "pseudobatch-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }